Automatic Term Recognition based on Statistics of Compound Nouns

نویسنده

  • Hiroshi Nakagawa
چکیده

The NTCIR1 TMREC group called for participation of the term recognition task which is a part of NTCIR1 held in 1999. As an activity of TMREC, they have provided us with the test collection of the term recognition task. The goal of this task is to automatically recognize and extract terms from the text corpus which consists of 1,870 abstracts gathered from the NACSIS Academic Conference Database. This paper describes the term extraction method we have proposed to extract terms consisting of simple and compound nouns and the experimental evaluation of the proposed method with this NTCIR TMREC test collection. The basic idea of scoring a simple noun :N of our term extraction method is to count how many nouns are conjoined with N to make compound nouns. Then we extend this score to measure the score of compound nouns because most of technical terms are compound nouns. Our method has a parameter to tune the degree of preference either for longer compound nouns or for shorter compound nouns. As for term candidates, in addition to noun sequences, we may add variations such as patterns of “A NO B” that roughly means “B of A” or “A's B” and/or “A NA B” where “A NA” is an adjective. Experimental results of our method are promising, namely recall of 0.83, precision of 0.46 and F-value of 0.59 for exactly matched extracted terms when we take into account top scoring 16,000 extracted terms.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Term Recognition based on Statistics of Compound Nouns and their Components

In this paper, we propose a new idea to enhance automatic recognition system for domain specific terms. Our idea is based on the statistics about the relation between a

متن کامل

Automatic Term Recognition based on Statistics of Compound Noun and its Components

In this paper, we propose a new idea of automatically recognizing domain specific terms. Our idea is based on the statistics between a compound noun and its component single-nouns. More precisely, we focus basically on how many nouns adjoin the noun in question to form compound nouns. We propose several scoring methods based on this idea and experimentally evaluate them on NTCIR1 TMREC test col...

متن کامل

A Simple But Powerful Automatic Term Extraction Method

In this paper, we propose a new idea for the automatic recognition of domain specific terms. Our idea is based on the statistics between a compound noun and its component single-nouns. More precisely, we focus basically on how many nouns adjoin the noun in question to form compound nouns. We propose several scoring methods based on this idea and experimentally evaluate them on the NTCIR1 TMREC ...

متن کامل

An Analysis of Persian‌ Compound Nouns as Constructions

In Construction Morphology (CM), a compound is treated as a construction at the word level with a systematic correlation between its form and meaning, in the sense that any change in the form is accompanied by a change in the meaning. Compound words are coined by compounding templates which are called abstract schemas in CM. These abstract constructional schemas generalize over sets of existing...

متن کامل

Detecting accent sandhi in Japanese using a superpositional F0 model

In this report, we propose a method for automatic prosodic structure recognition of Japanese utterances based on a superpositional F0 model, focusing particularly on the accent sandhi phonemenon in compound nouns. The method enables automatic labeling of F0 contours using the model, which can be useful for creating prosodic databases containing F0 contours in a parametric form. The prosodic str...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001